Cavity-aware motifs reduce false positives in protein function prediction.

Authors

  • Brian Y Chen
  • Drew H Bryant
  • Viacheslav Y Fofanov
  • David M Kristensen
  • Amanda E Cruess
  • Marek Kimmel
  • Olivier Lichtarge
  • Lydia E Kavraki
Abstract

Determining the function of proteins is a problem with immense practical impact on the identification of inhibition targets and the causes of side effects. Unfortunately, experimental determination of protein function is expensive and time-consuming. For this reason, algorithms for computational function prediction have been developed to focus and accelerate this effort. These algorithms are comparison techniques that identify matches of geometric and chemical similarity between motifs, which represent known functional sites, and substructures of functionally uncharacterized proteins (targets). Matches of statistically significant geometric and chemical similarity can identify targets with active sites cognate to the matching motif. Unfortunately, statistically significant matches can include false positive matches to functionally unrelated proteins. We target this problem by presenting Cavity Aware Match Augmentation (CAMA), a technique that uses C-spheres to represent active clefts which must remain vacant for ligand binding. CAMA rejects matches to targets without similar binding volumes. On 18 sample motifs, we observed that introducing C-spheres eliminated 80% of false positive matches and maintained 87% of true positive matches found with identical motifs lacking C-spheres. Analyzing a range of C-sphere positions and sizes, we observed that some high-impact C-spheres eliminate more false positive matches than others. High-impact C-spheres can be detected with a geometric analysis we call Cavity Scaling, permitting us to refine our initial cavity-aware motifs to contain only high-impact C-spheres. In the absence of expert knowledge, Cavity Scaling can guide the design of cavity-aware motifs to eliminate many false positive matches.
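
The rejection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the motif has already been aligned to the target, that each C-sphere is given as a center and radius in the target's coordinate frame, and that a C-sphere counts as occupied when any target atom center lies strictly inside it. The names `sphere_is_vacant`, `passes_cavity_filter`, and the `max_occupied` tolerance are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
CSphere = Tuple[Point, float]  # (center, radius), hypothetical representation


def sphere_is_vacant(center: Point, radius: float,
                     atoms: Sequence[Point]) -> bool:
    """Return True if no atom center lies strictly inside the sphere."""
    return all(math.dist(center, atom) >= radius for atom in atoms)


def passes_cavity_filter(c_spheres: Sequence[CSphere],
                         target_atoms: Sequence[Point],
                         max_occupied: int = 0) -> bool:
    """Accept a match only if at most `max_occupied` C-spheres are occupied.

    Assumption: the C-spheres are already expressed in the target's frame,
    i.e. the motif-to-target alignment has been applied beforehand.
    """
    occupied = sum(
        1 for center, radius in c_spheres
        if not sphere_is_vacant(center, radius, target_atoms)
    )
    return occupied <= max_occupied


# Toy example with made-up coordinates (not real protein data).
spheres = [((0.0, 0.0, 0.0), 2.0)]
open_cleft = [(3.0, 0.0, 0.0)]     # atom outside the sphere
blocked_cleft = [(1.0, 0.0, 0.0)]  # atom inside the sphere

print(passes_cavity_filter(spheres, open_cleft))
print(passes_cavity_filter(spheres, blocked_cleft))
```

The `max_occupied` parameter is one possible way to tolerate minor intrusions into the cleft. A stricter or looser occupancy criterion could be substituted without changing the overall structure of the filter.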

Similar resources

Cavity Scaling: Automated Refinement of Cavity-Aware Motifs in Protein Function Prediction

Algorithms for geometric and chemical comparison of protein substructure can be useful for many applications in protein function prediction. These motif matching algorithms identify matches of geometric and chemical similarity between well-studied functional sites, motifs, and substructures of functionally uncharacterized proteins, targets. For the purpose of function prediction, the accuracy o...

Partitioning of Minimotifs Based on Function with Improved Prediction Accuracy

BACKGROUND Minimotifs are short contiguous peptide sequences in proteins that are known to have a function in at least one other protein. One of the principal limitations in minimotif prediction is that false positives limit the usefulness of this approach. As a step toward resolving this problem we have built, implemented, and tested a new data-driven algorithm that reduces false-positive pred...

Geometry-based Methods for Protein Function Prediction by Brian

The development of new and effective drugs is strongly affected by the need to identify drug targets and to reduce side effects. Unfortunately, resolving these issues depends partially on a broad and thorough understanding of the biological function of many proteins, and the experimental determination of protein function is expensive and time consuming. In response to this problem, algorithms f...

Prediction of High-throughput Protein-Protein Interactions and Calmodulin Binding Using Short Linear Motifs

Prediction of protein-protein interactions (PPIs) is a difficult and important problem in biology. Although high-throughput technologies have made remarkable progress, the predictions are often inaccurate and include high rates of both false positives and false negatives. In addition, prediction of Calmodulin Binding Proteins (CaM-binding) is a problem that has been investigated deeply, though ...

Relationship between Data Size and Accuracy of Prediction of Protein-Protein Interactions by Co-Evolutionary Information

The prediction of protein-protein interaction (PPI) with genomic information is an important issue of bioinformatics. Mirror tree is a method to predict PPIs by evaluating the similarity of the phylogenetic trees or distance matrices [1]. In this method, the intensity of the co-evolution between a pair of proteins is evaluated by Pearson's correlation coefficient between a pair of distance matr...

Journal:
  • Computational systems bioinformatics. Computational Systems Bioinformatics Conference

Publication date: 2006